Polonium: Tera-Scale Graph Mining and Inference for Malware Detection
نویسندگان
چکیده
We present Polonium, a novel Symantec technology that detects malware through large-scale graph inference. Based on the scalable Belief Propagation algorithm, Polonium infers every file’s reputation, flagging files with low reputation as malware. We evaluated Polonium with a billion-node graph constructed from the largest file submissions dataset ever published (60 terabytes). Polonium attained a high true positive rate of 87% in detecting malware; in the field, Polonium lifted the detection rate of existing methods by 10 absolute percentage points. We detail Polonium’s design and implementation features instrumental to its success. Polonium has served 120 million people and helped answer more than one trillion queries for file reputation.
منابع مشابه
Data Mining Meets HCI: Making Sense of Large Graphs
We have entered the age of big data. Massive datasets are now common in science, government and enterprises. Yet, making sense of these data remains a fundamental challenge. Where do we start our analysis? Where to go next? How to visualize our findings? We answers these questions by bridging Data Mining and HumanComputer Interaction (HCI) to create tools for making sense of graphs with billion...
متن کاملMalware Detection using Classification of Variable-Length Sequences
In this paper, a novel method based on the graph is proposed to classify the sequence of variable length as feature extraction. The proposed method overcomes the problems of the traditional graph with variable length of data, without fixing length of sequences, by determining the most frequent instructions and insertion the rest of instructions on the set of “other”, save speed and memory. Acco...
متن کاملSemantic Malware Detection by Deploying Graph Mining
Today malware is a serious threat to our society. Several researchers are studying detection and mitigation of malware threats. On the other hand malware authors try to use obfuscation techniques for evading detection. Unfortunately usual approach (e.g., antivirus software) use signature based method which can easily be evaded. For addressing these shortcomings dynamic methods have been introdu...
متن کاملMining Billion-Scale Graphs: Patterns and Algorithms
Graphs are everywhere: social networks, the World Wide Web, biological networks, and many more. The sizes of graphs are growing at unprecedented rate, spanning millions and billions of nodes and edges. What are the patterns in large graphs, spanning Giga, Tera, and heading toward Peta bytes? What are the best tools, and how can they help us solve graph mining problems? How do we scale up algori...
متن کاملDetection of Breast Cancer Progress Using Adaptive Nero Fuzzy Inference System and Data Mining Techniques
Prediction, diagnosis, recovery and recurrence of the breast cancer among the patients are always one of the most important challenges for explorers and scientists. Nowadays by using of the bioinformatics sciences, these challenges can be eliminated by using of the previous information of patients records. In this paper has been used adaptive nero fuzzy inference system and data mining techniqu...
متن کامل